Dataset statistics
| Number of variables | 12 |
|---|---|
| Number of observations | 10738 |
| Missing cells | 251 |
| Missing cells (%) | 0.2% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 1006.8 KiB |
| Average record size in memory | 96.0 B |
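As a quick consistency check, the Missing cells figures can be recomputed from the per-column Missing counts reported in the variable sections. The sketch below is plain Python, and the dictionary is transcribed by hand from this report. The percentage is taken over all cells (observations × variables), which is the reading that reproduces the 0.2% shown above.

```python
# Per-column "Missing" counts transcribed from the variable sections of this report.
# Columns not listed here report 0 missing values.
missing_by_column = {
    "customer_product_search_score": 42,
    "customer_stay_score": 37,
    "customer_product_variation_score": 46,
    "customer_order_score": 66,
    "customer_active_segment": 23,
    "X1": 37,
}

n_observations = 10738
n_variables = 12

# "Missing cells" is the total number of empty cells across all columns.
missing_cells = sum(missing_by_column.values())

# "Missing cells (%)" divides by every cell in the table, not by the row count.
missing_pct = 100 * missing_cells / (n_observations * n_variables)

print(missing_cells, f"{missing_pct:.1f}%")
```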
Variable types
| NUM | 8 |
|---|---|
| CAT | 3 |
| BOOL | 1 |
Reproduction
| Analysis started | 2020-12-07 11:42:36.361190 |
|---|---|
| Analysis finished | 2020-12-07 11:43:19.279868 |
| Duration | 42.92 seconds |
| Version | pandas-profiling v2.8.0 |
| Command line | pandas_profiling --config_file config.yaml [YOUR_FILE.csv] |
| Download configuration | config.yaml |
Warnings
| customer_stay_score is highly correlated with customer_ctr_score | High correlation |
|---|---|
| customer_ctr_score is highly correlated with customer_stay_score | High correlation |
| customer_id has unique values | Unique |
| customer_visit_score has unique values | Unique |
| customer_ctr_score has unique values | Unique |
| customer_frequency_score has unique values | Unique |
| customer_affinity_score has unique values | Unique |
customer_id
Categorical
| Distinct count | 10738 |
|---|---|
| Unique (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 83.9 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| csid_6010 | 1 | < 0.1% |
| csid_2717 | 1 | < 0.1% |
| csid_3960 | 1 | < 0.1% |
| csid_3746 | 1 | < 0.1% |
| csid_8065 | 1 | < 0.1% |
| csid_9298 | 1 | < 0.1% |
| csid_1460 | 1 | < 0.1% |
| csid_623 | 1 | < 0.1% |
| csid_9945 | 1 | < 0.1% |
| csid_2352 | 1 | < 0.1% |
| Other values (10728) | 10728 | 99.9% |
Length
| Max length | 10 |
|---|---|
| Median length | 9 |
| Mean length | 8.965729186 |
| Min length | 6 |
customer_visit_score
Real number (ℝ≥0)
| Distinct count | 10738 |
|---|---|
| Unique (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 19.060941294733624 |
|---|---|
| Minimum | 0.5689647666895101 |
| Maximum | 47.30669098267679 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 83.9 KiB |
Quantile statistics
| Minimum | 0.5689647667 |
|---|---|
| 5-th percentile | 7.442481958 |
| Q1 | 13.51802134 |
| median | 18.77410921 |
| Q3 | 24.50171939 |
| 95-th percentile | 31.42445376 |
| Maximum | 47.30669098 |
| Range | 46.73772622 |
| Interquartile range (IQR) | 10.98369805 |
Descriptive statistics
| Standard deviation | 7.419609076 |
|---|---|
| Coefficient of variation (CV) | 0.389257223 |
| Kurtosis | -0.4065214262 |
| Mean | 19.06094129 |
| Median Absolute Deviation (MAD) | 5.439947497 |
| Skewness | 0.1014477924 |
| Sum | 204676.3876 |
| Variance | 55.05059884 |
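The dispersion figures above are tied together by simple formulas: the variance (55.05) is the square of the standard deviation (7.42), the coefficient of variation (0.389) is the standard deviation divided by the mean (19.06), the range is the maximum minus the minimum, and the IQR (10.98) is Q3 minus Q1. The sketch below recomputes these with Python's `statistics` module on a toy list. It assumes sample (n − 1) variance and linear quantile interpolation, and treats MAD as the median of absolute deviations from the median; that reading is consistent with this roughly symmetric variable, where MAD (5.44) is close to IQR/2 (5.49).

```python
import math
import statistics


def dispersion_summary(values):
    """Recompute range, IQR, CV, MAD and variance for a list of numbers."""
    data = sorted(values)
    mean = statistics.fmean(data)
    variance = statistics.variance(data)  # sample variance, n - 1 denominator
    std = math.sqrt(variance)
    # method="inclusive" interpolates linearly between order statistics,
    # the same convention as pandas' default "linear" quantile interpolation.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    # Median absolute deviation: the median distance from the median.
    mad = statistics.median(abs(x - median) for x in data)
    return {
        "range": data[-1] - data[0],
        "iqr": q3 - q1,
        "cv": std / mean,
        "mad": mad,
        "variance": variance,
    }


summary = dispersion_summary([1, 2, 3, 4, 100])
print(summary)
```

The single outlier (100) inflates the standard deviation and CV but barely moves the IQR or MAD, which is why the report shows both kinds of measure.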
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 8.206125143 | 1 | < 0.1% |
| 12.30107767 | 1 | < 0.1% |
| 9.818670235 | 1 | < 0.1% |
| 20.23140584 | 1 | < 0.1% |
| 23.18956459 | 1 | < 0.1% |
| 18.97509644 | 1 | < 0.1% |
| 22.67105567 | 1 | < 0.1% |
| 29.17374567 | 1 | < 0.1% |
| 23.56993992 | 1 | < 0.1% |
| 19.33703771 | 1 | < 0.1% |
| Other values (10728) | 10728 | 99.9% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.5689647667 | 1 | < 0.1% |
| 0.6441806855 | 1 | < 0.1% |
| 0.6650534717 | 1 | < 0.1% |
| 0.715215517 | 1 | < 0.1% |
| 0.9186268439 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 47.30669098 | 1 | < 0.1% |
| 43.92674833 | 1 | < 0.1% |
| 43.75726982 | 1 | < 0.1% |
| 42.34256741 | 1 | < 0.1% |
| 42.19495825 | 1 | < 0.1% |
customer_product_search_score
Real number (ℝ)
| Distinct count | 10696 |
|---|---|
| Unique (%) | 100.0% |
| Missing | 42 |
| Missing (%) | 0.4% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 5.27484715286252 |
|---|---|
| Minimum | -0.16193998183198755 |
| Maximum | 16.63824329516359 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 83.9 KiB |
Quantile statistics
| Minimum | -0.1619399818 |
|---|---|
| 5-th percentile | 2.262566301 |
| Q1 | 3.971586843 |
| median | 5.218479286 |
| Q3 | 6.520363539 |
| 95-th percentile | 8.386104872 |
| Maximum | 16.6382433 |
| Range | 16.80018328 |
| Interquartile range (IQR) | 2.548776696 |
Descriptive statistics
| Standard deviation | 1.882558586 |
|---|---|
| Coefficient of variation (CV) | 0.3568934855 |
| Kurtosis | 0.545163275 |
| Mean | 5.274847153 |
| Median Absolute Deviation (MAD) | 1.276309404 |
| Skewness | 0.2892716474 |
| Sum | 56419.76515 |
| Variance | 3.54402683 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 5.800739967 | 1 | < 0.1% |
| 6.580534098 | 1 | < 0.1% |
| 7.556778851 | 1 | < 0.1% |
| 4.6302368 | 1 | < 0.1% |
| 3.796770983 | 1 | < 0.1% |
| 2.641433 | 1 | < 0.1% |
| 4.683022025 | 1 | < 0.1% |
| 5.47722877 | 1 | < 0.1% |
| 7.885387429 | 1 | < 0.1% |
| 4.698243006 | 1 | < 0.1% |
| Other values (10686) | 10686 | 99.5% |
| (Missing) | 42 | 0.4% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -0.1619399818 | 1 | < 0.1% |
| -0.04875713064 | 1 | < 0.1% |
| 0.0644344974 | 1 | < 0.1% |
| 0.08783044642 | 1 | < 0.1% |
| 0.1758818048 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 16.6382433 | 1 | < 0.1% |
| 16.63088664 | 1 | < 0.1% |
| 15.51932408 | 1 | < 0.1% |
| 14.65319495 | 1 | < 0.1% |
| 14.64986837 | 1 | < 0.1% |
customer_ctr_score
Real number (ℝ)
| Distinct count | 10738 |
|---|---|
| Unique (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.17591198060116714 |
|---|---|
| Minimum | -0.5479890837946332 |
| Maximum | 2.6794742421447224 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 83.9 KiB |
Quantile statistics
| Minimum | -0.5479890838 |
|---|---|
| 5-th percentile | -0.07250584954 |
| Q1 | 0.01084001214 |
| median | 0.07407813627 |
| Q3 | 0.1596064355 |
| 95-th percentile | 1.072821684 |
| Maximum | 2.679474242 |
| Range | 3.227463326 |
| Interquartile range (IQR) | 0.1487664233 |
Descriptive statistics
| Standard deviation | 0.3728289383 |
|---|---|
| Coefficient of variation (CV) | 2.119406177 |
| Kurtosis | 10.96033251 |
| Mean | 0.1759119806 |
| Median Absolute Deviation (MAD) | 0.07104629897 |
| Skewness | 3.216021049 |
| Sum | 1888.942848 |
| Variance | 0.1390014172 |
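The skewness (3.22) and kurtosis (10.96) above describe a long right tail: the 95th percentile (1.07) sits far above Q3 (0.16). The sketch below computes the bias-adjusted sample skewness G1 and excess kurtosis G2, the estimators that pandas' `Series.skew` and `Series.kurt` return; that this report uses exactly these estimators is an assumption. Both are close to 0 for normally distributed data.

```python
import math


def adjusted_skew_kurt(values):
    """Bias-adjusted sample skewness (G1) and excess kurtosis (G2)."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 observations")
    mean = sum(values) / n
    # Central moments with a 1/n denominator.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    if m2 == 0:
        raise ValueError("constant input has undefined skewness and kurtosis")
    g1 = m3 / m2 ** 1.5   # population skewness
    g2 = m4 / m2 ** 2 - 3  # population excess kurtosis
    # Small-sample corrections applied to g1 and g2.
    skew = math.sqrt(n * (n - 1)) / (n - 2) * g1
    kurt = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))
    return skew, kurt


print(adjusted_skew_kurt([1, 2, 3, 4, 5]))    # symmetric: skewness 0
print(adjusted_skew_kurt([1, 2, 3, 4, 100]))  # one large value: positive skewness
```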
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.1085098466 | 1 | < 0.1% |
| 0.01584065775 | 1 | < 0.1% |
| 0.06493509666 | 1 | < 0.1% |
| -0.1197484193 | 1 | < 0.1% |
| 0.1597885255 | 1 | < 0.1% |
| 0.1577491043 | 1 | < 0.1% |
| 0.1069763926 | 1 | < 0.1% |
| 0.06193393741 | 1 | < 0.1% |
| -0.02782028104 | 1 | < 0.1% |
| 0.3495371962 | 1 | < 0.1% |
| Other values (10728) | 10728 | 99.9% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -0.5479890838 | 1 | < 0.1% |
| -0.5462274631 | 1 | < 0.1% |
| -0.5384683288 | 1 | < 0.1% |
| -0.5339414237 | 1 | < 0.1% |
| -0.5324857093 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 2.679474242 | 1 | < 0.1% |
| 2.571238413 | 1 | < 0.1% |
| 2.57043864 | 1 | < 0.1% |
| 2.510406547 | 1 | < 0.1% |
| 2.390943097 | 1 | < 0.1% |
customer_stay_score
Real number (ℝ)
| Distinct count | 10701 |
|---|---|
| Unique (%) | 100.0% |
| Missing | 37 |
| Missing (%) | 0.3% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.37423006184720775 |
|---|---|
| Minimum | -0.4624940639254821 |
| Maximum | 14.701914171298233 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 83.9 KiB |
Quantile statistics
| Minimum | -0.4624940639 |
|---|---|
| 5-th percentile | -0.1002052956 |
| Q1 | -0.02766573337 |
| median | 0.03720079496 |
| Q3 | 0.1790287653 |
| 95-th percentile | 2.441966972 |
| Maximum | 14.70191417 |
| Range | 15.16440824 |
| Interquartile range (IQR) | 0.2066944986 |
Descriptive statistics
| Standard deviation | 1.222030798 |
|---|---|
| Coefficient of variation (CV) | 3.265453321 |
| Kurtosis | 29.79532391 |
| Mean | 0.3742300618 |
| Median Absolute Deviation (MAD) | 0.08274564002 |
| Skewness | 5.008726307 |
| Sum | 4004.635892 |
| Variance | 1.493359272 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| -0.1012199454 | 1 | < 0.1% |
| 0.1751477254 | 1 | < 0.1% |
| 0.04995203118 | 1 | < 0.1% |
| -0.04555211684 | 1 | < 0.1% |
| 0.0813363266 | 1 | < 0.1% |
| -0.02048455937 | 1 | < 0.1% |
| 0.1500162078 | 1 | < 0.1% |
| -0.06963437964 | 1 | < 0.1% |
| -0.004254199539 | 1 | < 0.1% |
| 0.1594164456 | 1 | < 0.1% |
| Other values (10691) | 10691 | 99.6% |
| (Missing) | 37 | 0.3% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -0.4624940639 | 1 | < 0.1% |
| -0.3895174278 | 1 | < 0.1% |
| -0.3724808355 | 1 | < 0.1% |
| -0.3532560254 | 1 | < 0.1% |
| -0.3492770074 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 14.70191417 | 1 | < 0.1% |
| 14.28113287 | 1 | < 0.1% |
| 13.53972037 | 1 | < 0.1% |
| 12.40849733 | 1 | < 0.1% |
| 11.99354174 | 1 | < 0.1% |
customer_frequency_score
Real number (ℝ≥0)
| Distinct count | 10738 |
|---|---|
| Unique (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.376894687891762 |
|---|---|
| Minimum | 0.028575210510020193 |
| Maximum | 52.39501392251049 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 83.9 KiB |
Quantile statistics
| Minimum | 0.02857521051 |
|---|---|
| 5-th percentile | 0.1542461741 |
| Q1 | 0.3136096545 |
| median | 0.5168299359 |
| Q3 | 1.125379515 |
| 95-th percentile | 14.11275303 |
| Maximum | 52.39501392 |
| Range | 52.36643871 |
| Interquartile range (IQR) | 0.8117698602 |
Descriptive statistics
| Standard deviation | 5.601910934 |
|---|---|
| Coefficient of variation (CV) | 2.356819157 |
| Kurtosis | 19.19443114 |
| Mean | 2.376894688 |
| Median Absolute Deviation (MAD) | 0.2615149293 |
| Skewness | 4.083012882 |
| Sum | 25523.09516 |
| Variance | 31.38140612 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.7903460794 | 1 | < 0.1% |
| 1.41808916 | 1 | < 0.1% |
| 0.691165902 | 1 | < 0.1% |
| 0.729271526 | 1 | < 0.1% |
| 10.02961875 | 1 | < 0.1% |
| 0.8282612463 | 1 | < 0.1% |
| 3.901457606 | 1 | < 0.1% |
| 0.111588142 | 1 | < 0.1% |
| 0.3752963629 | 1 | < 0.1% |
| 0.4475662319 | 1 | < 0.1% |
| Other values (10728) | 10728 | 99.9% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.02857521051 | 1 | < 0.1% |
| 0.03332008134 | 1 | < 0.1% |
| 0.0355902314 | 1 | < 0.1% |
| 0.03591151507 | 1 | < 0.1% |
| 0.03660536941 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 52.39501392 | 1 | < 0.1% |
| 49.67938001 | 1 | < 0.1% |
| 49.03419464 | 1 | < 0.1% |
| 47.81685008 | 1 | < 0.1% |
| 46.92130903 | 1 | < 0.1% |
customer_product_variation_score
Real number (ℝ≥0)
| Distinct count | 10692 |
|---|---|
| Unique (%) | 100.0% |
| Missing | 46 |
| Missing (%) | 0.4% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 5.788179528639336 |
|---|---|
| Minimum | 2.7528361476216268 |
| Maximum | 18.743835720199684 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 83.9 KiB |
Quantile statistics
| Minimum | 2.752836148 |
|---|---|
| 5-th percentile | 3.541244833 |
| Q1 | 4.193234472 |
| median | 4.842574595 |
| Q3 | 6.286400327 |
| 95-th percentile | 11.66568404 |
| Maximum | 18.74383572 |
| Range | 15.99099957 |
| Interquartile range (IQR) | 2.093165855 |
Descriptive statistics
| Standard deviation | 2.531309458 |
|---|---|
| Coefficient of variation (CV) | 0.4373239367 |
| Kurtosis | 3.191873393 |
| Mean | 5.788179529 |
| Median Absolute Deviation (MAD) | 0.8261130491 |
| Skewness | 1.851646948 |
| Sum | 61887.21552 |
| Variance | 6.40752757 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 12.22809216 | 1 | < 0.1% |
| 7.397833399 | 1 | < 0.1% |
| 9.47163936 | 1 | < 0.1% |
| 4.058211693 | 1 | < 0.1% |
| 10.22522605 | 1 | < 0.1% |
| 5.442010193 | 1 | < 0.1% |
| 6.452078811 | 1 | < 0.1% |
| 5.647095354 | 1 | < 0.1% |
| 3.682175325 | 1 | < 0.1% |
| 7.820523341 | 1 | < 0.1% |
| Other values (10682) | 10682 | 99.5% |
| (Missing) | 46 | 0.4% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 2.752836148 | 1 | < 0.1% |
| 2.787879879 | 1 | < 0.1% |
| 2.812295598 | 1 | < 0.1% |
| 2.821770078 | 1 | < 0.1% |
| 2.822925888 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 18.74383572 | 1 | < 0.1% |
| 18.48790753 | 1 | < 0.1% |
| 18.42971169 | 1 | < 0.1% |
| 18.3418688 | 1 | < 0.1% |
| 18.26678114 | 1 | < 0.1% |
customer_order_score
Real number (ℝ≥0)
| Distinct count | 10672 |
|---|---|
| Unique (%) | 100.0% |
| Missing | 66 |
| Missing (%) | 0.6% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.150070538556626 |
|---|---|
| Minimum | 0.36333795012621 |
| Maximum | 9.09020550869893 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 83.9 KiB |
Quantile statistics
| Minimum | 0.3633379501 |
|---|---|
| 5-th percentile | 1.532564209 |
| Q1 | 2.454017385 |
| median | 3.118394172 |
| Q3 | 3.756566397 |
| 95-th percentile | 4.892306349 |
| Maximum | 9.090205509 |
| Range | 8.726867559 |
| Interquartile range (IQR) | 1.302549012 |
Descriptive statistics
| Standard deviation | 1.03541551 |
|---|---|
| Coefficient of variation (CV) | 0.3286959759 |
| Kurtosis | 1.210741347 |
| Mean | 3.150070539 |
| Median Absolute Deviation (MAD) | 0.6528132847 |
| Skewness | 0.5768648974 |
| Sum | 33617.55279 |
| Variance | 1.072085278 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 3.61768905 | 1 | < 0.1% |
| 4.211795453 | 1 | < 0.1% |
| 3.771272483 | 1 | < 0.1% |
| 3.185724287 | 1 | < 0.1% |
| 4.350577461 | 1 | < 0.1% |
| 2.6436982 | 1 | < 0.1% |
| 2.527028663 | 1 | < 0.1% |
| 3.047355745 | 1 | < 0.1% |
| 2.231138913 | 1 | < 0.1% |
| 3.328788488 | 1 | < 0.1% |
| Other values (10662) | 10662 | 99.3% |
| (Missing) | 66 | 0.6% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.3633379501 | 1 | < 0.1% |
| 0.5371367527 | 1 | < 0.1% |
| 0.5610723909 | 1 | < 0.1% |
| 0.5692797177 | 1 | < 0.1% |
| 0.5997546164 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 9.090205509 | 1 | < 0.1% |
| 8.951938748 | 1 | < 0.1% |
| 8.937619861 | 1 | < 0.1% |
| 8.357390523 | 1 | < 0.1% |
| 8.226249469 | 1 | < 0.1% |
customer_affinity_score
Real number (ℝ)
| Distinct count | 10738 |
|---|---|
| Unique (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 17.06183578563719 |
|---|---|
| Minimum | -0.4868340562827102 |
| Maximum | 248.55275470161067 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 83.9 KiB |
Quantile statistics
| Minimum | -0.4868340563 |
|---|---|
| 5-th percentile | -0.08343869615 |
| Q1 | 4.530085389 |
| median | 12.65335707 |
| Q3 | 23.11457668 |
| 95-th percentile | 50.46467763 |
| Maximum | 248.5527547 |
| Range | 249.0395888 |
| Interquartile range (IQR) | 18.58449129 |
Descriptive statistics
| Standard deviation | 18.76269336 |
|---|---|
| Coefficient of variation (CV) | 1.099687841 |
| Kurtosis | 16.85754965 |
| Mean | 17.06183579 |
| Median Absolute Deviation (MAD) | 8.987608416 |
| Skewness | 2.993483837 |
| Sum | 183209.9927 |
| Variance | 352.0386622 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 18.51900811 | 1 | < 0.1% |
| 25.13272248 | 1 | < 0.1% |
| 29.40032001 | 1 | < 0.1% |
| 7.766693473 | 1 | < 0.1% |
| 14.96031205 | 1 | < 0.1% |
| 15.74434363 | 1 | < 0.1% |
| 23.27878962 | 1 | < 0.1% |
| 12.63416693 | 1 | < 0.1% |
| 12.29702508 | 1 | < 0.1% |
| 20.1937733 | 1 | < 0.1% |
| Other values (10728) | 10728 | 99.9% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -0.4868340563 | 1 | < 0.1% |
| -0.482894096 | 1 | < 0.1% |
| -0.473328527 | 1 | < 0.1% |
| -0.4556253366 | 1 | < 0.1% |
| -0.4542975823 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 248.5527547 | 1 | < 0.1% |
| 246.9369655 | 1 | < 0.1% |
| 218.4587702 | 1 | < 0.1% |
| 206.6697283 | 1 | < 0.1% |
| 198.923264 | 1 | < 0.1% |
customer_active_segment
Categorical
| Distinct count | 5 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 23 |
| Missing (%) | 0.2% |
| Memory size | 83.9 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| C | 4919 | 45.8% |
| B | 4430 | 41.3% |
| D | 536 | 5.0% |
| AA | 418 | 3.9% |
| A | 412 | 3.8% |
| (Missing) | 23 | 0.2% |
Length
| Max length | 3 |
|---|---|
| Median length | 1 |
| Mean length | 1.043211026 |
| Min length | 1 |
X1
Categorical
| Distinct count | 5 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 37 |
| Missing (%) | 0.3% |
| Memory size | 83.9 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| BA | 4511 | 42.0% |
| A | 2268 | 21.1% |
| F | 2235 | 20.8% |
| AA | 1611 | 15.0% |
| E | 76 | 0.7% |
| (Missing) | 37 | 0.3% |
Length
| Max length | 3 |
|---|---|
| Median length | 2 |
| Mean length | 1.577016204 |
| Min length | 1 |
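The Max length of 3 for customer_active_segment and X1 is surprising, since the longest category in each column has two characters. One explanation consistent with the numbers is that missing cells are converted to the string "nan" (three characters) before lengths are measured. The sketch below tests that hypothesis by recomputing the count-weighted mean length from the two value tables. It reproduces both reported Mean length values, but this is an inference from the figures, not documented behaviour.

```python
# Value counts transcribed from the customer_active_segment and X1 tables above.
# Missing cells are stored under the key None.
active_segment_counts = {"C": 4919, "B": 4430, "D": 536, "AA": 418, "A": 412, None: 23}
x1_counts = {"BA": 4511, "A": 2268, "F": 2235, "AA": 1611, "E": 76, None: 37}


def as_text(value):
    # Hypothesis under test: a missing cell is measured as str(float("nan")) == "nan".
    return str(float("nan")) if value is None else value


def mean_length(counts):
    """Count-weighted mean string length over all rows, missing cells included."""
    total_chars = sum(len(as_text(value)) * count for value, count in counts.items())
    total_rows = sum(counts.values())
    return total_chars / total_rows


for name, counts in [("customer_active_segment", active_segment_counts), ("X1", x1_counts)]:
    max_len = max(len(as_text(value)) for value in counts)
    print(name, max_len, round(mean_length(counts), 9))
```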
customer_category
Boolean
| Distinct count | 2 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 83.9 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 9443 | 87.9% |
| 1 | 1295 | 12.1% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation and 1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables (for positive scale factors), so for data lying exactly on a non-horizontal line, r is ±1 regardless of the line's slope. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
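A minimal sketch of that calculation in plain Python, on made-up data. It also checks the invariance property: shifting and positively rescaling either variable leaves r unchanged, while a negative slope flips its sign.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: covariance of X and Y over the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/(n - 1) factors in the covariance and in both standard deviations
    # cancel, so plain sums of deviation products are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sx * sy)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]
r = pearson_r(xs, ys)
# Change of location and (positive) scale in both variables.
r_shifted = pearson_r([10 * x + 7 for x in xs], [0.5 * y - 3 for y in ys])
print(round(r, 6), round(r_shifted, 6))
```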
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation and 1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
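Spearman's ρ is Pearson's r applied to ranks. The sketch below gives tied values the average of the ranks they occupy (a common convention, assumed here) and contrasts the two coefficients on a cubic relationship: ρ is 1 because the relationship is monotonic, while r is only about 0.94 because it is not linear.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance over the product of standard deviations (1/(n-1) factors cancel)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]  # monotonic but not linear
print(round(pearson_r(xs, ys), 3), round(spearman_rho(xs, ys), 6))
```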
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation and 1 indicates total positive correlation. To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is then the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2 for n observations.
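The pair-counting definition above corresponds to the variant usually called tau-a. The sketch counts pairs directly, which is O(n²) and fine for illustration only. Library implementations can differ: SciPy's `scipy.stats.kendalltau` computes tau-b by default, which adjusts the denominator for ties; when there are no ties the two variants agree.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(xs)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # Positive product: the pair is ordered the same way in X and in Y.
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
        # s == 0 is a tie in X or Y; tau-a counts it as neither.
    return (concordant - discordant) / (n * (n - 1) / 2)


xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
print(kendall_tau_a(xs, ys))  # 8 concordant, 2 discordant out of 10 pairs
```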
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for Phik.
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples, so we use a bias-corrected measure proposed by Bergsma in 2013.
First rows
| | customer_id | customer_visit_score | customer_product_search_score | customer_ctr_score | customer_stay_score | customer_frequency_score | customer_product_variation_score | customer_order_score | customer_affinity_score | customer_active_segment | X1 | customer_category |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | csid_1 | 13.168425 | 9.447662 | -0.070203 | -0.139541 | 0.436956 | 4.705761 | 2.537985 | 7.959503 | C | F | 0 |
| 1 | csid_2 | 17.092979 | 7.329056 | 0.153298 | -0.102726 | 0.380340 | 4.205138 | 4.193444 | 17.517381 | C | A | 0 |
| 2 | csid_3 | 17.505334 | 5.143676 | 0.106709 | 0.262834 | 0.417648 | 4.479070 | 3.878971 | 12.595155 | C | BA | 0 |
| 3 | csid_4 | 31.423381 | 4.917740 | -0.020226 | -0.100526 | 0.778130 | 5.055535 | 2.708940 | 4.795073 | AA | F | 0 |
| 4 | csid_5 | 11.909502 | 4.237073 | 0.187178 | 0.172891 | 0.162067 | 3.445247 | 3.677360 | 56.636326 | C | AA | 0 |
| 5 | csid_6 | 9.007922 | 7.051568 | 0.161564 | 0.040997 | 0.191935 | 4.209840 | 3.181961 | 18.862680 | C | BA | 0 |
| 6 | csid_7 | 13.707109 | 5.625179 | 0.009634 | -0.019998 | 0.177622 | 4.165093 | 4.689834 | 109.203352 | B | E | 0 |
| 7 | csid_8 | 32.042122 | 3.563568 | -0.050730 | NaN | 0.257060 | 4.366761 | 4.041260 | 24.036321 | AA | A | 0 |
| 8 | csid_9 | 20.434181 | 5.111682 | 0.133922 | 0.036893 | 0.442314 | 4.759516 | 3.407424 | 17.078123 | C | BA | 0 |
| 9 | csid_10 | 13.778214 | 3.829299 | 0.159102 | 0.165818 | 0.558187 | 6.255980 | 3.315462 | 9.443864 | B | BA | 0 |
Last rows
| | customer_id | customer_visit_score | customer_product_search_score | customer_ctr_score | customer_stay_score | customer_frequency_score | customer_product_variation_score | customer_order_score | customer_affinity_score | customer_active_segment | X1 | customer_category |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10728 | csid_10729 | 24.772317 | 4.753238 | 0.019578 | -0.070097 | 0.556264 | 4.590020 | 3.126145 | 12.193862 | C | A | 0 |
| 10729 | csid_10730 | 11.657455 | 6.233353 | 0.007517 | -0.016122 | 0.476700 | 4.024655 | 2.727740 | 22.214286 | B | A | 0 |
| 10730 | csid_10731 | 18.793887 | 4.199826 | 0.143747 | 0.219525 | 0.267384 | 3.867731 | 2.893106 | 28.685574 | B | BA | 0 |
| 10731 | csid_10732 | 29.094167 | 6.391500 | -0.051283 | -0.079743 | 0.434865 | 4.791949 | 2.244512 | 6.251333 | B | BA | 0 |
| 10732 | csid_10733 | 14.664036 | 5.341811 | 0.043920 | -0.125090 | 0.269019 | 4.563034 | 3.685176 | 14.066261 | C | A | 0 |
| 10733 | csid_10734 | 23.672615 | 6.701514 | 0.092879 | -0.017332 | 1.210397 | 7.003663 | 3.027084 | 1.952911 | C | BA | 0 |
| 10734 | csid_10735 | 25.673028 | 6.497796 | 0.050216 | -0.047211 | 0.725230 | 5.407507 | 3.104172 | 5.124286 | C | BA | 0 |
| 10735 | csid_10736 | 31.676844 | 7.799880 | 0.062961 | -0.032765 | 0.318118 | 5.598486 | 2.403051 | 21.864188 | A | BA | 0 |
| 10736 | csid_10737 | 28.441780 | 5.588302 | -0.093931 | 0.081586 | 0.132177 | 3.616492 | 4.972243 | 86.969977 | B | AA | 0 |
| 10737 | csid_10738 | 20.663035 | 4.478301 | 0.253165 | 0.381349 | 0.504904 | 4.181092 | 4.469215 | 27.770899 | B | A | 0 |